SVitchboard 1: Small Vocabulary Tasks from Switchboard 1
Abstract
We present a conversational telephone speech data set designed to support research on novel acoustic models. Small vocabulary tasks from 10 words up to 500 words are defined using subsets of the Switchboard-1 corpus; each task has a completely closed vocabulary (an OOV rate of 0%). We justify the need for these tasks, describe the algorithm for selecting them from a large corpus, give a statistical analysis of the data and present baseline whole-word hidden Markov model recognition results. The goal of the paper is to define a common data set and to encourage other researchers to use it.
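To make the closed-vocabulary idea concrete, the following is a minimal sketch and not the selection algorithm described in the paper. It assumes, purely for illustration, that the vocabulary is the `vocab_size` most frequent words and that an utterance is kept only if every one of its words is in that vocabulary. The function names `closed_vocab_subset` and `oov_rate` are hypothetical.

```python
from collections import Counter


def closed_vocab_subset(utterances, vocab_size):
    """Return (vocab, kept): a vocabulary and the utterances fully covered by it.

    Assumption for illustration: the vocabulary is simply the `vocab_size`
    most frequent whitespace-separated words in the input.
    """
    counts = Counter(word for utt in utterances for word in utt.split())
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    # Keep an utterance only if none of its words falls outside the vocabulary.
    kept = [utt for utt in utterances if all(w in vocab for w in utt.split())]
    return vocab, kept


def oov_rate(utterances, vocab):
    """Fraction of word tokens in `utterances` that are not in `vocab`."""
    tokens = [w for utt in utterances for w in utt.split()]
    if not tokens:
        return 0.0
    return sum(w not in vocab for w in tokens) / len(tokens)


if __name__ == "__main__":
    toy = ["yeah", "yeah yeah", "uh huh", "uh huh", "uh yeah right", "huh"]
    vocab, kept = closed_vocab_subset(toy, vocab_size=3)
    print(sorted(vocab), kept)
    print(oov_rate(kept, vocab))  # the kept subset has no OOV tokens
```

Because every kept utterance contains only in-vocabulary words, the OOV rate of the resulting subset is 0% by construction, which is the property the abstract describes.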
Similar resources
SVitchboard II and fiSVer i: high-quality limited-complexity corpora of conversational English speech
In this paper, we introduce a set of benchmark corpora of conversational English speech derived from the Switchboard-I and Fisher datasets. Traditional ASR research requires considerable computational resources and has slow experimental turnaround times. Our goal is to introduce these new datasets to researchers in the ASR and machine learning communities (especially in academia), in order to f...
Expert Systems for Document Retrieval: Problems in Capturing Synonym Relations from the Experts
A key problem in designing information retrieval systems is making the information contained in the system available quickly and easily, with a minimum of training or preparation required by the user. New users may be experts in their fields and have no problems with the mechanics of using the computer; yet they may still have difficulty in retrieving information because they are not familiar ...
On Two Extensions of Abstract Categorial Grammars
Abstract Categorial Grammar: a type-theoretic grammar formalism for describing natural languages; based on the implicative fragment of linear logic; resource-sensitive; simple but expressive enough; mildly context-sensitive languages are generated by second-order ACGs (de Groote and Pogodalla 2004) ...
Applications of virtual-evidence based speech recognizer training
We present two applications of our previously proposed virtual-evidence (VE) based speech recognizer training algorithm [1, 2]. The first relates to two-pass training where segmentations obtained during the first pass are used as VE to train the subsequent pass. We use the TIMIT phone and SVitchboard continuous speech recognition tasks to demonstrate the benefits of using VE based training in tw...